Conversation
Three cross-cutting changes plus correctness corrections surfaced in review:
1. UTC buckets, local-TZ working patterns. All cross-commit aggregations (monthly/weekly/yearly buckets, active days, first/last dates, trend cutoff) now format with .UTC(), so commits at the same UTC instant in different local offsets land in the same bucket. workGrid and the per-dev work grid intentionally keep local TZ — they describe the author's work rhythm, not UTC instants.
2. Rename tracking via finalize-time reconciliation. git log is newest-first, so rename edges (status R*) are captured during the stream and resolved in a second pass. File entries, coupling pairs/denominators, and contributor file sets are re-keyed onto canonical (latest) paths. Chains A→B→C collapse. Self-pairs are dropped. A cycle guard prevents infinite loops on degenerate A→B→A. Validated on pi-hole, praat, WordPress, and kubernetes (merge counts align with rename events).
3. Churn risk reframed from a single risk_score to classification labels: cold / active / active-core / silo / legacy-hotspot. Ranking is now by recent_churn desc (tiebreak: lower bus factor). The composite risk_score field is retained for CI-gate back-compat but flagged in docs as divergent from the label semantics.
Honesty corrections:
- DirectoryStats.Commits renamed to FileTouches. The prior name implied distinct commits, but the value was the sum of per-file commit counts (one commit touching N files contributes N). Breaking change for consumers reading JSON/CSV by field name.
- DeveloperNetwork adds SharedLines = Σ min(linesA, linesB) across shared files. Primary sort is now SharedLines desc. SharedFiles is kept as a legacy field. Trivial one-line overlaps no longer dominate.
- Classification thresholds extracted to named constants (classify*, contrib*).
- extract warns when the repo has .mailmap but --mailmap was not set.
- docs/METRICS.md gains caveats on: the --mailmap default, --ignore desync between commit-level and file-level totals, TZ spoofing in working patterns, the --since + firstChange interaction, and classification degenerate edges (cycle, single-file, identical churn).
- The README churn-risk example uses real pi-hole output (one sample per label). The CI-gate section flags --fail-on-churn-risk as the legacy composite.
New tests: TestActivityBucketUsesUTC, TestChurnTrend stability, TestRenameMergesHistory, TestRenameChain, TestRenameCollapsesCouplingSelfPair, TestRenameCycleDoesNotCrash, TestRenameDevFilesTouchedUsesCanonical, TestRenameDedupPrefersChronologicallyNewest, TestClassifyFile, TestDeveloperNetworkSharedLinesWeighsRealOverlap.
All existing tests pass. go vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Go map iteration is randomized and sort.Slice is unstable, so any ranking function without an explicit tiebreaker produces different orderings on every invocation when the primary sort key ties. Ties on integer keys (BusFactor=1, Commits=1 on patchwork activity) are ubiquitous in real repos — this is why the CLI JSON output and the HTML report were showing different top-N files for the same dataset.
Six functions gained a stable secondary key:
- BusFactor: bus_factor asc, then path asc
- FileHotspots: commits desc, then path asc
- TopContributors: commits desc, then email asc
- DirectoryStats: file_touches desc, then dir asc
- TopCommits: lines_changed desc, then sha asc
- DevProfiles: commits desc, then email asc
Three functions already had tiebreakers (ChurnRisk, FileCoupling, DeveloperNetwork) — left alone.
Tests:
- TestAllSortsDeterministicUnderTies: table-driven; runs each of the six sort functions 20 times on a fixture where every primary sort key ties (6 files at bf=1, 6 devs at commits=1, 3 dirs at file_touches=2, etc.). Fails if any iteration produces a different ordering.
- TestBusFactorOrderDeterministicUnderTies: focused regression on the original finding; also asserts the path-asc tiebreaker.
Sanity-checked that the fixture actually triggers non-determinism when a tiebreaker is removed (ran BusFactor without a tiebreaker across 50 iterations and observed divergent orderings). The determinism test is not vacuous.
Validated CLI JSON vs HTML report coherence across pi-hole, praat, WordPress, and kubernetes — all four now produce identical top-N orderings between the two outputs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
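The tiebreaker pattern is the same in all six functions. A minimal sketch — the struct and field names here are illustrative, not the project's actual types:

```go
package main

import (
	"fmt"
	"sort"
)

type fileStat struct {
	Path    string
	Commits int
}

// sortHotspots orders by commits desc with a path-asc tiebreaker.
// sort.Slice is not stable and Go map iteration is randomized, so
// without the explicit secondary key, equal-commit files would come
// out in a different order on every run.
func sortHotspots(files []fileStat) {
	sort.Slice(files, func(i, j int) bool {
		if files[i].Commits != files[j].Commits {
			return files[i].Commits > files[j].Commits // primary: commits desc
		}
		return files[i].Path < files[j].Path // secondary: path asc, deterministic under ties
	})
}

func main() {
	files := []fileStat{{"c.go", 1}, {"a.go", 1}, {"b.go", 3}}
	sortHotspots(files)
	for _, f := range files {
		fmt.Println(f.Path, f.Commits)
	}
	// b.go first (3 commits), then a.go and c.go in path order
	// regardless of input order.
}
```

sort.SliceStable would not help here on its own: stability preserves *input* order under ties, and the input order itself comes from randomized map iteration.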
Every ranking function now has an explicit tiebreaker (prior commit 2b20ef0). Surface this as a user-visible property in METRICS.md so consumers parsing JSON output can rely on stable ordering, and so the CLI/HTML coherence guarantee is explicit. One table listing all stats with primary key + tiebreaker, placed in the Behavior and caveats section where other guarantees (timezone, renames) are documented. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The seven values that govern churn-risk classification and dev profile contribType are all exported as package constants (classify*, contrib* in internal/stats/stats.go). The rules tables in METRICS.md inlined the numeric defaults but did not link them to the constant names or tell contributors where to look to re-calibrate. Add a Thresholds subsection in Behavior and caveats listing all seven with default values and what each controls, and note that overrides require a rebuild — no runtime flag exists yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit count is a flawed proxy for contribution: squash vs merge workflows, bot committers (k8s-ci-robot alone holds 20k commits on kubernetes), and per-dev commit style all distort the signal. The existing "80% of commits" card amplifies these distortions. Rather than replace it (substituting a churn-only lens would trade one bias — frequent committers — for another — verbose authors and generated-file writers), surface both.
ParetoData gains DevsPct80Churn and TopChurnDevs, computed by re-sorting contributors on additions+deletions with an email-asc tiebreaker. Divergence between the two cards is the actual signal:
- aligned cards (pi-hole: 17 ≈ 17) → consistent contributor base
- commits ≫ churn (k8s: 267 vs 132) → bots or merge-squash inflation
- churn ≫ commits would indicate single heavy-feature authors
The HTML report adds a second Devs card below the commits one with a hint explaining how to read the divergence. CLI Pareto output splits the Devs line into "Devs (commits)" and "Devs (churn)".
TestComputePareto updated to assert the new field is populated and percentages stay in [0, 100].
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two latent bugs surfaced in review of the dual-lens Pareto commit:
1. When total commits (or total churn) is zero across all contributors, the 80% threshold is zero too. The first loop iteration then tripped `cumulative >= 0` and incremented TopCommitDevs / TopChurnDevs to 1 before breaking. Result: "1 dev produces 80% of 0 churn" — which then passed through the guards and rendered in the HTML card alongside a 0% judgment. Visible only on pathological datasets (all-merge repos, empty-commit streams) but non-trivial to diagnose when it occurs. Guard by skipping the loop entirely when the aggregate is zero. Applied to both the commits path (pre-existing latent bug) and the new churn path.
2. The HTML card for the churn lens used `eq .Pareto.TotalDevs 0` as the no-data signal, mirroring the sibling cards. But with contributors present and zero churn, TotalDevs > 0 while the metric itself has no signal — the 🔴 "extremely concentrated" branch fired for 0%. Switch the churn card to check `TopChurnDevs 0` directly, which after fix #1 correctly stays at 0 in the zero-churn case.
Tests:
- TestComputeParetoDivergenceBotVsAuthor: 100 tiny bot commits + 3 big human commits. Asserts both lenses identify 1 dev as the 80% owner — the distinction is which dev each lens picks, confirmed by exercising both code paths independently.
- TestComputeParetoZeroChurn: all-empty commits (0 adds, 0 dels). Asserts TopChurnDevs stays 0 rather than leaking to 1. This test fails before fix #1, passes after.
Also trimmed the long inline comment on DevsPct80Churn in the struct and added a brief comment on the defensive `copy(byChurn, contribs)` to explain why we duplicate rather than mutate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
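The zero-aggregate guard from fix #1 can be sketched like this. A minimal illustration under assumptions — `devsFor80Pct` is a hypothetical helper standing in for the loop inside ComputePareto:

```go
package main

import "fmt"

// devsFor80Pct counts how many of the already-sorted (desc) per-dev
// values it takes to cover 80% of the total. Skipping the loop when
// the total is zero avoids reporting "1 dev owns 80% of 0 churn":
// with total==0 the threshold is 0, and the first iteration would
// trip cumulative >= 0 and return 1.
func devsFor80Pct(sortedDesc []int) int {
	total := 0
	for _, v := range sortedDesc {
		total += v
	}
	if total == 0 {
		return 0 // no signal — let callers render "no data"
	}
	threshold := float64(total) * 0.8
	cumulative, n := 0, 0
	for _, v := range sortedDesc {
		cumulative += v
		n++
		if float64(cumulative) >= threshold {
			break
		}
	}
	return n
}

func main() {
	fmt.Println(devsFor80Pct([]int{500, 300, 100, 50, 50})) // 2 (800 of 1000)
	fmt.Println(devsFor80Pct([]int{0, 0, 0}))               // 0, not 1
}
```

A downstream "no data" check can then test the result against 0 directly, which is the same property the HTML card fix relies on.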
The Pareto section in METRICS.md described devs as a single lens sorted by commit count. After commit b1861f0 (churn lens added), the section was stale — it omitted the second lens, the rationale (commits are biased by squash/bot workflows), and how to read divergence between the two cards. Rewrite the calculation table with three rows for the dev lenses, add a "Reading the divergence" block with concrete examples from the validated repos (pi-hole aligned, kubernetes 267 vs 132), and note that the no-data marker can now fire on either the commits or churn aggregate independently. Also add `pareto` to the stats table in README — pre-existing omission; the stat was reachable but undocumented in the command reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-change in the same commit was counted as coupling regardless of the commit's character. A global rename or format pass touching 30 files generated 435 spurious pairs per commit, drowning the real signal from focused multi-file changes.
Add a heuristic in flushCoupling: when a commit touches ≥ refactorMinFiles (10) with mean per-file churn < refactorMaxChurnPerFile (5 lines), treat it as mechanical and skip pair counting. The individual file-change counts (couplingFileChanges, surfaced as ChangesA/ChangesB in the output) are still incremented — only the pair accumulation is suppressed. This preserves honest "how often does this file change" totals while removing the combinatorial noise.
Tests:
- TestCouplingExcludesMechanicalRefactor: a commit with 10 files × 1 line each must NOT contribute pairs; a normal multi-file commit on the same files must; and file-change denominators must include both commits (honest totals).
- TestCouplingKeepsNormalMultiFileCommits: counter-test — 10 files × 50 lines must still produce all 45 pairs; the filter only catches the low-churn pattern.
METRICS.md coupling section updated with the refactor-filter rule, a "co-change is not causation" caveat, and the two new thresholds in the constants table.
Validated on pi-hole, praat, WordPress, kubernetes — top coupling pairs remain sensible (test ↔ impl, .cpp ↔ .h, generated types ↔ sources).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
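The filter predicate can be sketched as follows, using the two threshold values named above. A minimal illustration under assumptions — the real check lives inside flushCoupling:

```go
package main

import "fmt"

// Illustrative defaults matching the thresholds described above.
const (
	refactorMinFiles        = 10
	refactorMaxChurnPerFile = 5.0
)

// isMechanicalRefactor flags commits that touch many files with very
// low mean per-file churn (additions + deletions) — the signature of
// a global rename or format pass. Such commits skip pair accumulation
// in the coupling stat. The comparison is strict <, so a mean of
// exactly 5.0 is NOT filtered.
func isMechanicalRefactor(fileCount, commitChurn int) bool {
	if fileCount < refactorMinFiles {
		return false
	}
	mean := float64(commitChurn) / float64(fileCount)
	return mean < refactorMaxChurnPerFile
}

func main() {
	fmt.Println(isMechanicalRefactor(10, 10))   // true: 10 files, mean 1.0
	fmt.Println(isMechanicalRefactor(10, 50))   // false: mean exactly 5.0
	fmt.Println(isMechanicalRefactor(9, 0))     // false: below file threshold
	fmt.Println(isMechanicalRefactor(20, 1000)) // false: substantial refactor
}
```

The division is safe because fileCount ≥ refactorMinFiles > 0 on the only path that reaches it.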
The refactor filter added in f94b0af had a test for the main happy path and a counter-test for normal commits, but nothing at the threshold boundaries or for the case that originally motivated the feature.
Add TestCouplingRefactorFilterBoundaries — table-driven, six cases:
- 9 files, zero churn → NOT filtered (below refactorMinFiles)
- 10 files, zero churn (pure rename pattern) → filtered
- 10 files × 4 lines (mean 4.0) → filtered
- 10 files × 5 lines (mean 5.0 exactly) → NOT filtered (strict `<`)
- 10 files × 6 lines (mean 6.0) → NOT filtered
- 20 files × 50 lines (substantial refactor) → NOT filtered
Each case also asserts that couplingFileChanges (the denominator) is incremented for every file regardless of filter outcome — the honest-totals invariant was implicit before.
Also document a real limitation in METRICS.md: a commit mixing many low-churn renames with a few high-churn substantive edits can have a mean that escapes the filter. Per-file weighting would fix it but requires restructuring pair generation; flagged as an acknowledged limitation rather than a planned change so readers know the blast radius of the current heuristic.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four low-priority items from the review of f94b0af:
1. Extract an isMechanicalRefactor(fileCount, commitChurn) helper. The filter condition was inlined in flushCoupling, mixing the check with the early return; a named predicate reads cleaner and is directly callable from tests if needed later.
2. Clarify the refactorMaxChurnPerFile constant comment: the unit is line churn (additions + deletions), and the comparison is strict `<`, so a mean of exactly 5.0 is NOT filtered. The constant name alone suggested "5 lines" without specifying which count.
3. Add TestCouplingRenameBelowFilterThreshold, pinning down the interaction between rename tracking and the refactor filter: a commit renaming 8 files (below refactorMinFiles) leaks pairs into the canonical path space. CoChanges = 1 per pair, so sensible `--coupling-min-changes` thresholds hide them, but raw ds.couplingPairs contains them. The test exists so any future tightening of the filter is a conscious, test-backed change.
4. Document caveats in METRICS.md that were implicit:
- rename commits below the refactor-filter fileCount threshold leak pairs (current behavior, validated by #3 above)
- `--coupling-max-files` below 10 runs first and makes the refactor filter dead code — correct layering but non-obvious
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues found in a deeper audit of stats calculations:
1. DevProfiles inner sorts (TopFiles, Scope, Collaborators) had no tiebreakers. The outer profile list got one in 2b20ef0, but the three slices *inside* each profile remained non-deterministic on ties. A dev with many files at equal churn would show a different top-10 between CLI and HTML, and between two CLI runs. Add path / dir / email asc tiebreakers to all three.
2. churnTrend returned 2 ("growing from nothing") for every file when the dataset span is shorter than the trend window. Under a tight --since filter (say 1 month against a 3-month window), no file can have an "earlier" bucket, so the baseline was empty → all files falsely flagged as growing. Add an earliest-date parameter and short-circuit to 1 (no signal) when earliestKey >= cutoffKey.
3. ChurnRisk, FileCoupling, and DeveloperNetwork had primary + secondary sort keys but no final tiebreaker. When both tie — not uncommon, since BusFactor and SharedFiles are small integers — the order depends on Go map iteration. Add path / file-pair / dev-pair asc as the tertiary tiebreaker.
Tests:
- TestDevProfilesInnerSortsDeterministic: crafted dataset with ties on churn (TopFiles), file count (Scope), and shared files (Collaborators). Runs DevProfiles 20 times and asserts identical inner slices each time, plus the specific asc orderings.
- TestChurnTrend extended with a short-span case: two months of data against a 3-month window. Previously returned 2; must now return 1.
- Existing TestChurnTrend cases updated to pass an earliest parameter that is well before the window (no behavioral change for them).
Validated on pi-hole and kubernetes: two consecutive stats runs produce byte-identical churn-risk output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DeveloperNetwork was upgraded in b1861f0 to use SharedLines = Σ min(linesA, linesB) as the primary overlap signal, rejecting the trivial-touch bias of a pure file-count approach. DevProfile.Collaborators kept the old SharedFiles-only semantics, so the two views over the same developer-to-developer relationship disagreed: devs with many one-line touches on shared files ranked highly on a profile but not on the network.
Close the gap: DevCollaborator gains SharedLines (same formula); the primary sort becomes SharedLines desc with SharedFiles as the secondary and Email as the tertiary tiebreaker. HTML templates (repo-level profile card and standalone profile report) now show both counts so the honest signal is visible.
Tests:
- TestDevProfilesCollaboratorsSharedLines: Alice co-edits two files with Bob (trivial: 1 line each file) and Carol (substantive: 500 lines each). SharedFiles ties at 2 for both; SharedLines correctly ranks Carol (1000) over Bob (2).
- TestFileCouplingTertiaryTiebreaker, TestDeveloperNetworkTertiaryTiebreaker, TestChurnRiskTertiaryTiebreaker: craft datasets where both primary and secondary sort keys tie; assert deterministic asc ordering on the tertiary key. These cover the tertiary tiebreakers added in 557f6c7 that had no dedicated test.
Validated on WordPress: `ryan@...` top collaborator shifted from `azaozz` (567 files, 118k lines) to `wp@andrewnacin` (467 files, 131k lines) — the new ordering correctly weights real overlap, matching what DeveloperNetwork already showed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
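The SharedLines formula is small enough to sketch directly. A minimal illustration under assumptions — the per-dev `map[path]lines` representation and the `sharedLines` name are illustrative, not the project's actual data structures:

```go
package main

import "fmt"

// sharedLines scores the overlap between two devs as Σ min(linesA, linesB)
// over the files both touched, so a one-line drive-by touch contributes
// at most 1 while substantive co-editing dominates. Maps are
// file path → lines contributed by that dev.
func sharedLines(a, b map[string]int) int {
	total := 0
	for path, la := range a {
		if lb, ok := b[path]; ok {
			if la < lb {
				total += la
			} else {
				total += lb
			}
		}
	}
	return total
}

func main() {
	alice := map[string]int{"x.go": 500, "y.go": 500}
	bob := map[string]int{"x.go": 1, "y.go": 1}       // trivial touches
	carol := map[string]int{"x.go": 500, "y.go": 500} // substantive overlap

	// Both share 2 files with Alice, but SharedLines separates them.
	fmt.Println(sharedLines(alice, bob))   // 2
	fmt.Println(sharedLines(alice, carol)) // 1000
}
```

This mirrors the Bob/Carol fixture in TestDevProfilesCollaboratorsSharedLines: equal SharedFiles, wildly different SharedLines.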
Two stale descriptions in METRICS.md after recent commits:
1. The Profile Collaborators row said "Top 5 devs sharing the most files with this dev", which described the old SharedFiles-only ranking. After 876341f the ranking is by shared_lines with the same Σ min(linesA, linesB) semantics as Developer Network. Rewrite the row to match the actual calculation and cross-link to the Developer Network section.
2. The Reproducibility table listed primary + secondary keys per stat but said nothing about the tertiary tiebreakers added in 557f6c7, nor about the deterministic ordering of profile sub-lists (TopFiles, Scope, Collaborators) fixed in the same commit. Add a short note after the table covering both.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e1f2e9e4be
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
A user report flagged that `stats --stat pareto` on zero-churn datasets printed "extremely concentrated" for the Devs (churn) line instead of "no data". The fix for zero-churn devs landed in 3270090, but that commit scoped the guard to the two devs code paths and left three siblings stale:
1. cmd/gitcortex/main.go — the CLI was calling judge() with p.TotalDevs / p.TotalFiles / p.TotalDirs as the zero check. Those are non-zero as long as the dataset has any contributors / files / directories, regardless of whether the churn/commits signal is present. Passing p.TopCommitDevs, p.TopChurnDevs, p.TopChurnFiles, and p.TopChurnDirs instead makes the "no data" branch fire exactly when the corresponding signal is absent.
2. internal/report/report.go — ComputePareto's Files and Dirs 80% loops had the same zero-threshold-tripped-on-first-iteration bug I already fixed for devs. A merge-only or pure-rename dataset (non-empty ds.files with all-zero churn) used to leave TopChurnFiles = 1 and TopChurnDirs = 1 after the loop, with percentages of 1/TotalFiles and 1/TotalDirs rendered as "extremely concentrated". Add `if totalChurn > 0` / `if totalDirChurn > 0` guards mirroring the devs fix.
3. internal/report/template.go — the HTML Files and Dirs cards used `eq .Pareto.TotalFiles 0` / `eq .Pareto.TotalDirs 0` as the no-data signal, the same bug class as the devs card fixed in 3270090 but missed for the other two dimensions. Switch them to `eq .Pareto.TopChurnFiles 0` / `eq .Pareto.TopChurnDirs 0`. The commits-devs card gets the same treatment (`eq .Pareto.TopCommitDevs 0`) for consistency with the churn-devs card fixed in 3270090.
Tests:
- TestComputeParetoFilesAndDirsZeroChurn: JSONL with two files, both commit_file records having additions=0 and deletions=0. TotalFiles must be 2 (the files exist), but TopChurnFiles, FilesPct80Churn, TopChurnDirs, and DirsPct80Churn must all be 0 (no signal).
Validated on kubernetes: Pareto output identical to pre-fix on real data (the guard only fires on the zero path). CLI on the synthetic zero-churn dataset now prints "no data" on the Files, Devs (churn), and Dirs lines — no false alarms.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review
💡 Codex Review
Reviewed commit: 2db4d8da88
The prior applyRenames logic deduplicated rename edges with the same
oldPath by keeping the first-seen (newest, given newest-first JSONL
order) and discarding the rest. This silently misattributed history
whenever a repo reused a path name:
t1: A exists, is renamed to B
t2: a new unrelated file is created at "A"
t3: that new A is renamed to D
JSONL yields edges [{A→D}, {A→B}]. ds.files["A"] at finalize time
already contains commits from BOTH lineages (the path key can't hold
them separately), so any migration spreads one lineage into the wrong
target. The old dedup sent ds.files["A"] to D, leaving B without its
pre-rename history and D polluted with commits that belong to B.
Fix: pre-count oldPath occurrences and skip migration entirely for
any oldPath that appears more than once. Without per-commit temporal
tracking we cannot disambiguate lineages, and underattribution (both
targets miss their pre-rename history, and ds.files["A"] stays put
with the merged-at-ingest-time lineages) is strictly safer than
misattribution into the wrong target.
A full fix requires oldest-first stream processing so ds.files["A"]
can be forked into separate entries at each rename event. That is a
larger architectural change; flagged as a known limitation in
METRICS.md.
Tests:
- TestRenameSkipsReusedOldPath: replaces the prior
TestRenameDedupPrefersChronologicallyNewest, which asserted the
exact behavior that was wrong. Same dataset shape; new assertions
verify A remains in place and neither B nor D receives A's data.
- TestRenameChainNotMistakenForReuse: guards against over-correction
— a genuine A→B→C chain has distinct oldPaths, must still merge.
Validated on four real repos: TotalFiles ticked up across all of
them (pi-hole +4, praat +2, WordPress +27, kubernetes +81),
corresponding to lineages that were previously being misattributed
and now stay put.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review
💡 Codex Review
Reviewed commit: b8e578549f
The skip-if-duplicate rule from b8e5785 was too aggressive: it also silenced legitimate rename-back flows like A → B → A → C, where the repeated A is the same lineage recreated by the intermediate rename B → A. Under the old behavior both A-edges got dropped, so the chain stopped collapsing and B's history sat orphaned on a stale path instead of rolling up into C.
Refine the detection: an oldPath is treated as reuse only when its count is > 1 AND no edge has it as a newPath. The second condition catches the rename-back pattern — a duplicate oldPath that is itself the target of another edge belongs to a chain, not to two unrelated lineages. The heuristic is exact for these two common cases (pure reuse, rename-back chain) and imperfect on exotic mixed cases (e.g. A→B, C→A, A→D), which are now flagged as chains and will misattribute one lineage — a known limitation documented in METRICS.md.
Tests:
- TestRenameBackThenRenameAgain: crafted chain A → B → A → C with activity on all three names. Asserts the whole chain collapses into C and A/B disappear from ds.files. Previously failed because both A edges were skipped; now passes.
- TestRenameSkipsReusedOldPath: unchanged — pure reuse (A→B, A→D with no B→A in between) still skips, since hasIncomingRename[A] stays false.
- TestRenameChainNotMistakenForReuse: unchanged — A→B→C has no duplicates, so neither path triggers the reuse branch.
Validated on four real repos. Compared to the skip-all version:
- pi-hole 211 (unchanged — 4 real path reuses, 0 rename-backs)
- praat 10707 (−2: chains now collapse correctly)
- WordPress 8106 (−27: chains now collapse correctly)
- kubernetes 75839 (−65 rename-back chains collapse, 16 path reuses remain preserved)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four polish items from review of 90af977:
1. applyRenames now explicitly sorts ds.renameEdges by commitDate desc before the dedup pass. Previously the "first seen wins" rule silently depended on git log emitting newest-first; if the extractor ever switched to --reverse or a future format change reshuffled the stream, dedup would flip to "oldest wins" and chain collapses would resolve in the wrong direction. The sort makes the invariant explicit rather than implicit. Adds a commitDate field to renameEdge, captured from ds.commits[SHA]. Zero behavior change on current data (pi-hole 211, praat 10707, WordPress 8106, kubernetes 75839 — identical to pre-sort).
2. Rename `hasIncomingRename` → `isRenameTarget` and extract the reuse-detection condition into named booleans `isDuplicate` / `wasRecreated` for readability. The original single-line expression with a negation required mental inversion to parse.
3. New TestRenameMixedLineageKnownLimitation pins the current (imperfect) behavior on the exotic case A→B, then C→A, then A→D. The heuristic can't tell this apart from a rename-back chain and misattributes lineage 1's pre-rename commits to D. The test documents the outcome so any future fix (proper temporal segmentation) updates it consciously rather than silently drifting to a different wrong answer.
4. TestRenameCycleDoesNotCrash now also asserts that both A.go and B.go survive the cycle bail — previously it only checked that streamLoad didn't return an error, which would have missed a bug that silently routed one file into the other.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The early-return guard `len(monthChurn) < 2` was too aggressive. It
forced trend=1 (no signal) for any file with churn concentrated in a
single month, silencing exactly the strongest classification inputs:
- earlier-only: a file touched only before the trend cutoff should
return 0 (declined to nothing) and flag old concentrated files as
legacy-hotspot. Under the old guard, these files stayed at silo.
- recent-only: a file touched only inside the trend window should
return 2 (grew from nothing). Functionally already reachable for
multi-month recent-only cases, but silenced for single-month ones.
Change the guard to `len == 0` — a truly empty map is the only case
that should short-circuit to neutral. Run the recent/earlier split
for any non-empty history. Added an explicit `if recent == 0 { return
0 }` branch for symmetry with the existing recent-only branch (the
division would already yield 0.0 naturally, but making the
"declined to nothing" case explicit matches the docstring and code
reads closer to the intent).
Tests: TestChurnTrend extended with two single-month cases, one old
(earlier-only, want 0) and one recent (recent-only, want 2). Both
failed under the old guard; both pass now.
Impact on real data is significant. Files touched once long ago and
never again were being mis-classified as silo; under the fix they
correctly become legacy-hotspot:
  pi-hole:    silo 14 → 0,     legacy-hotspot 49 → 63       (+14)
  WordPress:  silo 333 → 305,  legacy-hotspot 616 → 644     (+28)
  kubernetes: silo 6402 → 235, legacy-hotspot 22848 → 29021 (+6173)
The kubernetes shift is especially telling: a long-lived repo with
thousands of files touched once in an early commit and never again.
Those ARE legacy hotspots by the metric's own definition; the guard
was hiding them.
METRICS.md updated to describe the three edge-case return values
(1 empty, 2 recent-only, 0 earlier-only) and the short-span guard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
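The guard change above can be sketched as follows. This is a simplified illustration under assumptions — the return codes follow the text (0 declined, 1 no signal, 2 grew from nothing), but the mixed-history branch of the real function is collapsed to neutral here, and the signature is illustrative:

```go
package main

import "fmt"

// churnTrend splits a file's monthly churn at the window cutoff.
// YYYY-MM keys compare correctly as strings. The len == 0 guard is
// the only short-circuit to neutral; any non-empty history runs the
// recent/earlier split, so single-month files get a real verdict.
func churnTrend(monthChurn map[string]int, cutoffKey, earliestKey string) int {
	if len(monthChurn) == 0 {
		return 1 // truly empty history: neutral
	}
	if earliestKey >= cutoffKey {
		return 1 // dataset shorter than the window: no baseline, no signal
	}
	recent, earlier := 0, 0
	for month, churn := range monthChurn {
		if month >= cutoffKey {
			recent += churn
		} else {
			earlier += churn
		}
	}
	switch {
	case recent == 0 && earlier == 0:
		return 1 // only zero-value entries (e.g. pure renames): no signal
	case earlier == 0:
		return 2 // all activity inside the window: grew from nothing
	case recent == 0:
		return 0 // all activity before the window: declined to nothing
	default:
		return 1 // mixed history: neutral in this sketch
	}
}

func main() {
	// Old single-month file: must read as declined (0), not neutral.
	fmt.Println(churnTrend(map[string]int{"2020-01": 500}, "2024-01", "2019-01")) // 0
	// Recent single-month file: grew from nothing.
	fmt.Println(churnTrend(map[string]int{"2024-05": 40}, "2024-01", "2019-01")) // 2
	// Dataset span shorter than the window: no signal.
	fmt.Println(churnTrend(map[string]int{"2024-05": 40}, "2024-01", "2024-04")) // 1
}
```

Under the old `len(monthChurn) < 2` guard, the first two cases would both have returned 1, which is exactly the silo-vs-legacy-hotspot misclassification described above.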
Three review polish items from a48f240:
1. TestChurnRiskLegacyHotspotFromSingleOldMonth: end-to-end test asserting that a file with monthChurn={"2020-01":500}, bf=1, and age > 180d is classified as legacy-hotspot. The unit test (TestChurnTrend) checks churnTrend in isolation and TestClassifyFile checks classifyFile in isolation, but until now nothing verified the wiring inside ChurnRisk that connects them. If a future refactor accidentally passes the wrong arg to churnTrend (e.g. forgets ds.Earliest, or inverts the condition), this test catches it — the label would drop back to silo.
2. TestChurnTrend extended with a zero-value single-month case ({"2024-05":0}, possible when the only activity in a month is renames with no content change). Both buckets end up empty; the defensive fallthrough must return 1 (no signal). Guards against a future refactor that treats zero-value entries as present.
3. METRICS.md Classification section gains a sensitivity note: files touched once long ago and never again are the *common* case on mature repos (~29k in kubernetes), not exceptional. If label output looks heavy on legacy-hotspot, that is usually diagnosing real dormant code — not a bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review
💡 Codex Review
Reviewed commit: a3f0a91435
The reuse guard from 90af977 caught the multi-edge pattern (A→B and A→D with no intermediate B→A), but missed the more common single- edge case: A is renamed to B, then an unrelated file is later created at "A" and never renamed further. oldPathCounts[A] = 1, so the multi- edge heuristic passes it through, and the new file's history (mixed into ds.files["A"] at ingest time) migrates wholesale into B, corrupting hotspots, churn-risk, bus factor, coupling, and files-touched on both sides. Detect by temporal signature: if ds.files[oldPath].lastChange is after the rename's commitDate, activity continued on oldPath past the rename — a second lineage was created. Skip migration. Guard with !wasRecreated to avoid false positives on rename-back chains (A→B→A→C) and cycles (A↔B), where the later rename commit legitimately updates ds.files[oldPath].lastChange to its own date. In those cases wasRecreated=true (some edge has newPath=oldPath) and the single-edge check is disabled; the multi-edge heuristic or the dedup handles the decision instead. Both dates must be non-zero for the check to apply — defensive behavior when hand-constructed fileEntries in tests lack timestamps. Tests: - TestRenameSkipsSingleEdgeReusedOldPath: one edge A→B, ds.files["A"] with lastChange after the rename → migration skipped, A survives, B keeps only its pre-existing data. - TestRenameCleanRenameStillMigrates: one edge A→B, lastChange(A) before the rename → migration proceeds, A merges into B. Counter- test guarding against over-correction. - Existing TestRenameCycleDoesNotCrash had failed under a first version of the check (it missed the !wasRecreated guard); now passes because A is a rename target in a cycle. 
Validated on four real repos — the additional pattern is common:

  pi-hole     211 → 214        (+3 lineages preserved)
  praat       10,707 → 10,783  (+76)
  WordPress   8,106 → 8,393    (+287)
  kubernetes  75,839 → 76,402  (+563)

Nearly a thousand files across the four repos that had their new lineages silently absorbed into renamed targets now stay distinct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten the single-edge reuse check to catch the chain+reuse case that the prior version (60a8f23) missed: a D → A → B chain followed by a new unrelated file created at A. A is both a rename target (via D→A) and a rename source (via A→B), so wasRecreated[A] was true, which disabled the reuse detection entirely.

Replace the wasRecreated guard with a maxEdgeDate comparison: for each path, track the latest commitDate across ALL edges where the path appears (source or target), and fire the reuse check when ds.files[path].lastChange is after that maximum. This bounds the check correctly across every shape:
- clean chain A→B: maxEdgeDate[A]=t1, lastChange ≤ t1, no skip.
- rename-back A→B→A→C: maxEdgeDate[A]=t3, lastChange ≤ t3, no skip.
- cycle A↔B: maxEdgeDate[A]=max(t1,t2), lastChange = later rename's date, equal to the max — no skip; the cycle bail handles it.
- clean single-edge reuse: maxEdgeDate[A]=t1, lastChange > t1, skip.
- chain + reuse (D→A, A→B, new-A): maxEdgeDate[A]=t2, lastChange > t2, skip. Previously misattributed the new-A lineage into B.

Tests:
- TestRenameChainPlusReuse pins the newly detected case: D→A then A→B followed by reuse at A. A stays, D migrates through its own edge, B keeps only its own data. This failed under the previous wasRecreated-guarded check.
- TestRenameSingleEdgeReuseViaStream exercises the simple single-edge reuse through the full streamLoadInto → applyRenames pipeline, covering the wiring that a direct ds.renameEdges test skips.
- TestRenameMixedLineageKnownLimitation gains a note explaining that its fixture intentionally omits lastChange so the reuse check does not fire; adding lastChange would turn it into a chain+reuse test with a different expected outcome.

Documented a new rename caveat: with `--since`, the reuse heuristics (oldPathCounts, isRenameTarget, maxEdgeDate) are computed only over edges inside the filter window, so a rename outside the window can change whether reuse is detected relative to a full-history extraction.
Real-data impact across four repos — both directions:

  pi-hole     214 → 215        (+1)
  praat       10,783 → 10,781  (-2)
  WordPress   8,393 → 8,368    (-25)
  kubernetes  76,402 → 76,538  (+136)

Some paths previously skipped under the pure-commitDate check now pass the stricter maxEdgeDate bar (fewer entries); other paths that the wasRecreated guard silenced are now correctly flagged (more entries). Net effect is positive where chain+reuse patterns dominate (kubernetes) and mixed where the repo's rename shape is simpler (WordPress, praat).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review
Codex Review: Didn't find any major issues. More of your lovely PRs please.
The rename caveats section in METRICS.md listed only two shapes (multi-edge reuse and rename-back chain), but gitcortex now detects four distinct patterns after commits 60a8f23 and 317a141 added the temporal single-edge reuse check and the maxEdgeDate refinement for chain+reuse. Expand the caveat to enumerate all four with their exact detection rules and cite the kubernetes validation number (136 chain+reuse files) so readers get a concrete sense of impact.

Four patterns, one known limitation (mixed lineage) documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-measured extract, stats, and report on six repositories using a
pre-built binary. Added two new columns for the read-side operations
(`stats --format json` and `report --output`) so the asymmetry between
them is visible — the prior table only showed extract time.
Notable divergence on large repos: report runs substantially slower
than stats because DevProfiles builds per-dev profiles (O(D × C) work).
On kubernetes that is ~80 extra seconds over stats; on linux it is ~20
extra minutes. Flagged in the paragraph following the table so users
choosing between interactive stats queries and HTML report generation
know the trade-off.
Numbers (commits / devs):
  pi-hole     7,077 / 281      extract=1.5s  stats=0.18s  report=0.27s
  praat       10,221 / 19      extract=25s   stats=0.96s  report=1.0s
  WordPress   52,466 / 131     extract=47s   stats=2.9s   report=3.0s
  kubernetes  137,016 / 5,295  extract=2m4s  stats=11.7s  report=1m29s
  chromium    1,319,124        (JSONL only; no bare clone in test env)
                               stats=20s     report=2m4s
  linux       6,078,056        (JSONL only; extract not re-run — too slow)
                               stats=1m15s   report=21m24s
Dev counts for Pi-hole (286 → 281) and Praat (24 → 19) also corrected —
they were stale from a prior snapshot.
Extract numbers for Linux and Chromium removed: the bare clone for
Chromium is not available in the test env, and Linux was not re-run
to keep this measurement session under an hour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The per-dev Collaborators accumulation was nested inside the outer loop over contributors — O(D × F × D_per_file) work. On kubernetes (5,295 devs × 75,823 files × ~3 devs-per-file avg) that's ~1.2 billion iterations. The HTML report, which builds profiles for every dev, spent most of its runtime here: 1m 29s on kubernetes vs 11.7s for the equivalent `stats --format json` path (which skips profiles).

Pre-compute a single `allCollabs map[string]map[string]*collabAcc` in one pass over ds.files before entering the per-dev loop. For each file with ≥ 2 devs, enumerate unordered pairs and accumulate to both directions (a→b and b→a) so each dev's entry is O(1) to read later. Complexity drops to O(F × D_per_file²) ≈ 700k iterations on kubernetes, roughly three orders of magnitude less work. Inside the per-dev loop the 30-line inline computation is replaced with a three-line map lookup. Semantics unchanged: same min-per-file overlap for SharedLines, same SharedFiles count.

Trade-off: peak memory holds one acc entry per ordered dev-pair that shares a file (~50 MB on kubernetes, ~150 MB on linux). Acceptable; flagged in the comment.

Real-data speedups (same binary, same JSONLs):
  kubernetes  1m 29s  → 14s     (6.4×)
  chromium    2m 4s   → 21s     (5.9×)
  linux       21m 24s → 1m 53s  (11.4×)

Small repos (pi-hole, praat, WordPress) see no meaningful change — they were already <3 s and the old O(D × F × D_avg) cost was small enough not to matter.

All 52 existing tests pass; a spot-check on k8s-ci-robot's collaborator list confirmed identical values pre/post.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Formalize the allCollabs pre-compute optimization with two benchmarks: one for the full-dataset path (report generation), one for the filterEmail path (single-dev query). The synthetic dataset matches the shape of a mid-size repo — 1,000 devs, 10,000 files, 5 devs per file, 30,000 commits — large enough that the old O(D × F × D_avg) code would show a measurable delta but small enough to run in a couple of seconds per sample.

Baseline numbers on the current machine (i7-9750H):
  BenchmarkDevProfilesAll          90.6 ms/op  34.3 MB/op  195k allocs
  BenchmarkDevProfilesFilterEmail  25.6 ms/op   1.3 MB/op   20k allocs

These are ceiling/floor values to detect regression. Any future change that doubles allocations or runtime for the same dataset shape will show up immediately instead of silently making `report` slower on real repos.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
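The benchmark pattern can be sketched with a scaled-down synthetic dataset of the same shape (devs × files × devs-per-file). buildProfiles and makeFiles are stand-ins invented for this sketch, not the real DevProfiles code or fixtures; only the shape-driven dataset generation and the testing.Benchmark harness reflect the commit.

```go
package main

import (
	"fmt"
	"testing"
)

// buildProfiles is a trivial stand-in for per-dev profile aggregation:
// it counts files touched per dev.
func buildProfiles(files map[string][]string) map[string]int {
	touched := map[string]int{}
	for _, devs := range files {
		for _, d := range devs {
			touched[d]++
		}
	}
	return touched
}

// makeFiles builds a synthetic dataset: nFiles files, each touched by
// devsPerFile devs drawn round-robin from nDevs devs.
func makeFiles(nDevs, nFiles, devsPerFile int) map[string][]string {
	files := make(map[string][]string, nFiles)
	for f := 0; f < nFiles; f++ {
		devs := make([]string, devsPerFile)
		for k := range devs {
			devs[k] = fmt.Sprintf("dev%d", (f+k)%nDevs)
		}
		files[fmt.Sprintf("file%d", f)] = devs
	}
	return files
}

func main() {
	// Scaled down from the commit's 1,000 devs / 10,000 files fixture
	// so the sketch runs in well under a second.
	files := makeFiles(100, 1000, 5)
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			_ = buildProfiles(files)
		}
	})
	fmt.Println(res, res.MemString())
}
```

In the real suite these would live as Benchmark* funcs run by `go test -bench`; testing.Benchmark just lets the sketch run standalone.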
The DevProfiles optimization in a207d2f collapsed report runtime on large repos from minutes to seconds. Rewrite the Performance section table with the new numbers and update the explanatory paragraph to match — the old paragraph framed report as slower than stats due to profile aggregation, which is no longer the story.

New shape: report is now within a small constant multiple of stats across all six benchmarked repos (kubernetes ratio 14s/11.7s ≈ 1.2×, linux ratio 113s/75s ≈ 1.5×). The O(F × D_per_file²) pre-compute is cheap enough that building the full HTML dashboard is no longer meaningfully slower than running the aggregate.

Also corrects stale developer counts from prior snapshots: Pi-hole 286 → 281, Praat 24 → 19.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review
Codex Review: Didn't find any major issues. Bravo.
Chromium was included in the previous refresh as a read-only row (stats and report timings only — the bare clone was not available in the test environment, so extract was dashed). Dropping it keeps every row in the table end-to-end measurable from a single machine, which is the clearer story.

Re-ran extract on a fresh /tmp output for linux.git to replace the previous "—" placeholder: 12m 57s over 1,438,634 commits (≈1,850 commits/sec), in line with the pre-existing 13m 12s benchmark in the original README. All five remaining rows now show extract + stats + report timings from the same machine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Line-count metrics treat a 50k-line generated.pb.go the same as a 50k-line hand-written module. Lock files regenerate on every dependency bump. Vendored deps inflate churn whenever updated. Minified JS, OpenAPI specs, protobuf outputs — all common, all distorting hotspots, churn-risk, bus factor, coupling, dev-network, and profiles without reflecting human contribution. This is the biggest practical distortion in every stat today, and the existing --ignore flag is the mitigation.

Add a dedicated README section between Performance and Privacy that:
- States the problem with a concrete example from the kubernetes validation (top legacy-hotspots were all generated files)
- Provides a starter --ignore set covering vendor/, node_modules/, dist/, build/, minified assets, lock files from five ecosystems (npm, yarn, Cargo, Go, Python), and *.pb.go / *_generated.go
- Describes the iterative workflow: extract permissively, inspect hotspots/churn-risk, add --ignore entries, re-extract
- Links back to the METRICS.md caveat that Summary totals do not re-sum after --ignore filtering

Positioned early in the README (before Privacy, Install, Usage) so users running gitcortex for the first time on a real repo see the guidance before they get confused by vendor-dominated output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
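The starter set's filtering idea can be sketched in Go. This is a speculative illustration: gitcortex's actual --ignore matching semantics are not specified here, so the directory-prefix vs basename-glob split below is an assumption of this sketch, and only the pattern list itself comes from the commit.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// starterIgnores mirrors the starter set named in the commit: vendored
// dirs, build output, minified assets, five ecosystems' lock files, and
// generated Go sources.
var starterIgnores = []string{
	"vendor/", "node_modules/", "dist/", "build/",
	"*.min.js",
	"package-lock.json", "yarn.lock", "Cargo.lock", "go.sum", "poetry.lock",
	"*.pb.go", "*_generated.go",
}

// ignored treats patterns ending in "/" as directory prefixes anywhere in
// the path, and matches everything else against the basename with
// filepath.Match. Sketch semantics only.
func ignored(path string) bool {
	for _, pat := range starterIgnores {
		if strings.HasSuffix(pat, "/") {
			if strings.HasPrefix(path, pat) || strings.Contains(path, "/"+pat) {
				return true
			}
			continue
		}
		if ok, _ := filepath.Match(pat, filepath.Base(path)); ok {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{
		"vendor/x/y.go",          // true: vendored
		"api/types_generated.go", // true: *_generated.go
		"cmd/main.go",            // false: hand-written
	} {
		fmt.Println(p, ignored(p))
	}
}
```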
METRICS.md carried a Behavior-and-caveats subsection claiming that
commit-level totals (Summary.TotalAdditions/Deletions) were not
recomputed after --ignore filtering, so they would diverge from the
sum of hotspot churn. The same incorrect claim was then cited in the
new Vendor-and-generated-code section in README.md.
The code disagrees. internal/extract/extract.go:242-279 (emitCommit)
loops over commit.Raw, skips paths matched by --ignore, and computes
totalAdd/totalDel/filesCount as the sum of the non-ignored entries
before assigning them to the CommitInfo it writes to JSONL. The
commit record emitted to the stream therefore already reflects the
filter — downstream Summary totals match the file-level sums.
Remove the bogus caveat from METRICS.md and rewrite the corresponding
footnote in README.md to say the opposite: all totals stay
consistent after --ignore because extract recalculates before
writing. METRICS.md's Summary table row ("Recalculated when --ignore
is used") was already correct; the conflict between the two is
resolved in favour of the correct one.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>